
perf(hadoop): Use precompiled/hadoop for faster image builds#1472

Open
NickLarsenNZ wants to merge 5 commits into main from chore/faster-hadoop-builds

Conversation

@NickLarsenNZ
Member

Part of #1465

The precompiled/hadoop image was introduced in #1466.

@NickLarsenNZ NickLarsenNZ self-assigned this Apr 17, 2026
@NickLarsenNZ NickLarsenNZ moved this to Development: In Progress in Stackable Engineering Apr 17, 2026
@NickLarsenNZ NickLarsenNZ force-pushed the chore/faster-hadoop-builds branch from e089749 to 3caea9c on April 17, 2026, 10:04
@NickLarsenNZ
Member Author

NickLarsenNZ commented Apr 20, 2026

Waiting on #1474

Note: this commit is used to illustrate an issue where the timestamp appears as the version in the HDFS web UI instead of the SDP release.
@NickLarsenNZ
Member Author

NickLarsenNZ commented Apr 21, 2026

Caution

BLOCKER

We've run into an issue. It is mostly a cosmetic thing (albeit important to customers), but we also need to consider what this means in terms of SBOMs and lineage.

Note

I think this didn't appear in my initial PoC because I happened to use 0.0.0-dev before I thought of using a timestamp.

As part of the faster-builds work, we compile a version+patchset of some code (e.g. what was hadoop/hadoop, moving to precompiled/hadoop; both exist on main). The precompiled/* images aren't tied to a release, but instead use a timestamp.

This requires some changes (see: 0e79589), which leads to the timestamped version in the HDFS web UI.

(screenshot: HDFS web UI showing the timestamped version string)

@lfrancke found where in the code the web UI gets the version string from.
You can use these steps to check for yourself.

# Run as root so we can install unzip
docker run --rm -it --user 0 oci.stackable.tech/precompiled/hadoop:3.4.2-stackable1776755865
# Then, inside the container:
microdnf update && microdnf install unzip
cd /stackable/hadoop-3.4.2-stackable1776755865/share/hadoop/common
unzip hadoop-common-3.4.2-stackable1776755865.jar
cat common-version-info.properties

common-version-info.properties contains details used in the web UI:

# ...
version=3.4.2-stackable1776755865
revision=08a7206a29f212e1f0e3bd81e0cb0be7907907a4
branch=patchable/3.4.2
user=stackable
date=2026-04-21T07:38Z
url=Unknown
srcChecksum=e573e429c0e2b636ead65e1a9b8bd46a
protocVersion=3.23.4
compilePlatform=linux-x86_64

We could add a lightweight build stage to the product image that extracts the file, updates the version string, and re-archives it.

(image attachment)

This needs further discussion with @lfrancke, @StefanFl, @dervoeti.
Until then, I will stop any work on this.

@NickLarsenNZ
Member Author

NickLarsenNZ commented Apr 21, 2026

Just noting that tests pass at 0e79589

Details
cargo boil build hadoop=3.4.2 --strip-architecture
# Yes, I forgot to tell boil to do this
docker tag oci.stackable.tech/sdp/hadoop:3.4.2-stackable0.0.0-dev \
  localboi/hadoop:3.4.2-stackable0.0.0-dev
--- PASS: kuttl (3564.94s)
    --- PASS: kuttl/harness (0.00s)
        --- PASS: kuttl/harness/orphaned-resources_hadoop-latest-3.4.2,localboi_hadoop_3.4.2-stackable0.0.0-dev_zookeeper-latest-3.9.4_openshift-false (139.47s)
        --- PASS: kuttl/harness/kerberos_hadoop-3.4.2,localboi_hadoop_3.4.2-stackable0.0.0-dev_zookeeper-latest-3.9.4_krb5-1.21.1_opa-1.12.3_kerberos-realm-CLUSTER.LOCAL_kerberos-backend-mit_openshift-false (1471.72s)
        --- PASS: kuttl/harness/kerberos_hadoop-3.4.2,localboi_hadoop_3.4.2-stackable0.0.0-dev_zookeeper-latest-3.9.4_krb5-1.21.1_opa-1.12.3_kerberos-realm-PROD.MYCORP_kerberos-backend-mit_openshift-false (1502.26s)
        --- PASS: kuttl/harness/smoke_hadoop-3.4.2,localboi_hadoop_3.4.2-stackable0.0.0-dev_zookeeper-3.9.4_zookeeper-latest-3.9.4_number-of-datanodes-2_datanode-pvcs-2hdd-1ssd_listener-class-external-unstable_openshift-false (251.15s)
        --- PASS: kuttl/harness/cluster-operation_hadoop-latest-3.4.2,localboi_hadoop_3.4.2-stackable0.0.0-dev_zookeeper-latest-3.9.4_openshift-false (245.91s)
        --- PASS: kuttl/harness/topology-provider_hadoop-latest-3.4.2,localboi_hadoop_3.4.2-stackable0.0.0-dev_zookeeper-latest-3.9.4_krb5-1.21.1_kerberos-backend-mit_openshift-false (361.85s)
        --- PASS: kuttl/harness/smoke_hadoop-3.4.2,localboi_hadoop_3.4.2-stackable0.0.0-dev_zookeeper-3.9.4_zookeeper-latest-3.9.4_number-of-datanodes-2_datanode-pvcs-default_listener-class-external-unstable_openshift-false (241.20s)
        --- PASS: kuttl/harness/smoke_hadoop-3.4.2,localboi_hadoop_3.4.2-stackable0.0.0-dev_zookeeper-3.9.4_zookeeper-latest-3.9.4_number-of-datanodes-2_datanode-pvcs-default_listener-class-cluster-internal_openshift-false (228.57s)
        --- PASS: kuttl/harness/smoke_hadoop-3.4.2,localboi_hadoop_3.4.2-stackable0.0.0-dev_zookeeper-3.9.4_zookeeper-latest-3.9.4_number-of-datanodes-1_datanode-pvcs-default_listener-class-cluster-internal_openshift-false (167.15s)
        --- PASS: kuttl/harness/logging_hadoop-3.4.2,localboi_hadoop_3.4.2-stackable0.0.0-dev_zookeeper-latest-3.9.4_openshift-false (680.84s)
        --- PASS: kuttl/harness/smoke_hadoop-3.4.2,localboi_hadoop_3.4.2-stackable0.0.0-dev_zookeeper-3.9.4_zookeeper-latest-3.9.4_number-of-datanodes-1_datanode-pvcs-default_listener-class-external-unstable_openshift-false (155.63s)
        --- PASS: kuttl/harness/smoke_hadoop-3.4.2,localboi_hadoop_3.4.2-stackable0.0.0-dev_zookeeper-3.9.4_zookeeper-latest-3.9.4_number-of-datanodes-1_datanode-pvcs-2hdd-1ssd_listener-class-cluster-internal_openshift-false (170.71s)
        --- PASS: kuttl/harness/smoke_hadoop-3.4.2,localboi_hadoop_3.4.2-stackable0.0.0-dev_zookeeper-3.9.4_zookeeper-latest-3.9.4_number-of-datanodes-1_datanode-pvcs-2hdd-1ssd_listener-class-external-unstable_openshift-false (148.68s)
        --- PASS: kuttl/harness/profiling_hadoop-3.4.2,localboi_hadoop_3.4.2-stackable0.0.0-dev_zookeeper-latest-3.9.4_openshift-false (200.01s)
        --- PASS: kuttl/harness/smoke_hadoop-3.4.2,localboi_hadoop_3.4.2-stackable0.0.0-dev_zookeeper-3.9.4_zookeeper-latest-3.9.4_number-of-datanodes-2_datanode-pvcs-2hdd-1ssd_listener-class-cluster-internal_openshift-false (924.41s)
PASS

